{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading and Saving Data\n",
    "\n",
    "to/from csv, hdf5, jld2, mat and xlsx files."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load Packages and Extra Functions\n",
    "\n",
    "The packages are loaded in the respective sections below. This allows you to run parts of this notebook without having to install all packages.\n",
    "\n",
    "The data files created by this notebook are written to and loaded from the subfolder \"Results\"."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "using Dates\n",
    "include(\"printmat.jl\")\n",
    "\n",
    "if !isdir(\"Results\")\n",
    "    error(\"create the subfolder Results before running this program\")\n",
    "end"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading a csv File\n",
    "\n",
    "The csv (\"comma-separated values\") format provides a simple and robust method for moving data, and it can be read by most software.\n",
    "\n",
    "\n",
    "For instance, for *reading* a data file delimited by comma (,) and where the first line of the file contains variable names, then use the following\n",
    "```\n",
    "(x,header) = readdlm(FileName,',',header=true)\n",
    "```\n",
    "Alternatively, use\n",
    "```\n",
    "x = readdlm(FileName,',',skipstart=1)\n",
    "```\n",
    "to disregard the first line.\n",
    "\n",
    "Extra arguments control the type of data (Float64, Int, etc), suppression of comment lines and more.\n",
    "\n",
    "If you need more powerful write/read routines, try the [CSV package](https://github.com/JuliaData/CSV.jl)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "header of csv file:\n",
      "         X         Y         Z\n",
      "\n",
      "x:\n",
      "     1.100     1.200     1.300\n",
      "     2.100     2.200     2.300\n",
      "\n"
     ]
    }
   ],
   "source": [
    "using DelimitedFiles\n",
    "\n",
    "(x,header) = readdlm(\"Data/CsvFile.csv\",',',header=true)  #read csv file\n",
    "\n",
    "println(\"header of csv file:\")\n",
    "printmat(header)\n",
    "println(\"x:\")\n",
    "printmat(x)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Saving a csv File\n",
    "\n",
    "To *write* csv data, the simplest approach is to create the matrix you want to save and then run\n",
    "```\n",
    "writedlm(FileName,matrix)\n",
    "```\n",
    "Alternatively, to write several matrices to the file (without having to first combine them), use\n",
    "```\n",
    "fh = open(Filename, \"w\")\n",
    "    writedlm(fh,matrix1,',')\n",
    "    writedlm(fh,matrix2,',')\n",
    "close(fh)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\u001b[34m\u001b[1mNewCsvFile.csv has been created in the subfolder Results. Check it out.\u001b[22m\u001b[39m\n"
     ]
    }
   ],
   "source": [
    "x   = [1.1 1.2 1.3;\n",
    "       2.1 2.2 2.3]\n",
    "header = [\"X\" \"Y\" \"Z\"]\n",
    "\n",
    "xx = Any[header; x]                         #to save \n",
    "writedlm(\"Results/NewCsvFile.csv\",xx,',')  #write csv file\n",
    "\n",
    "printblue(\"NewCsvFile.csv has been created in the subfolder Results. Check it out.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading csv with Dates and Missing Values (extra)\n",
    "\n",
    "The next cells show how to load a csv files with dates (for instance, 15/01/1979) and some missing values. \n",
    "\n",
    "The code \n",
    "\n",
    "1. reads the csv file\n",
    "2. converts `x2[:,1]` to Date,\n",
    "3. finds all elements in `x = x2[:,2:end]` that are not numbers and converts them to NaN"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "readdlmFix"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "\"\"\"\n",
    "Change elements with missing data (' ') to either NaN or missing. \n",
    "`x` is the input matrix, `Typ` is the type of the output (Float64, Int, etc) and \n",
    "`missval` is your choice of either `NaN` or `missing`\n",
    "\"\"\"\n",
    "function readdlmFix(x,Typ=Float64,missVal=NaN)\n",
    "    y = replace(z->!isa(z,Number) ? missVal : z,x)\n",
    "    ismissing(missVal) && (Typ = Union{Missing,Typ})   #allow missing\n",
    "    y = convert.(Typ,y)\n",
    "  return y\n",
    "end"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "dates and data (first 4 obs):\n",
      "1979-01-02    96.730          \n",
      "1979-01-03               9.310\n",
      "1979-01-04    98.580     9.310\n",
      "1979-01-05    99.130     9.340\n",
      "\n",
      "after fix of missing data (first 4 obs):\n",
      "1979-01-02    96.730       NaN\n",
      "1979-01-03       NaN     9.310\n",
      "1979-01-04    98.580     9.310\n",
      "1979-01-05    99.130     9.340\n",
      "\n"
     ]
    }
   ],
   "source": [
    "x2 = readdlm(\"Data/CsvFileWithDates.csv\",',',skipstart=1)\n",
    "dN = Date.(x2[:,1],\"d/m/y\")     #to Date, \"d/m/y\" is the date format in the file\n",
    "x  = x2[:,2:end]                #the data, but Any[] since missing data\n",
    "\n",
    "println(\"dates and data (first 4 obs):\")\n",
    "printmat(Any[dN[1:4] x[1:4,:]])\n",
    "\n",
    "x = readdlmFix(x)\n",
    "\n",
    "println(\"after fix of missing data (first 4 obs):\")\n",
    "printmat(Any[dN[1:4] x[1:4,:]])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading and Saving hdf5\n",
    "\n",
    "hdf5 files are used in many computer languages. They can store different types of data: integers, floats, strings (but not Julia Dates).\n",
    "\n",
    "The basic syntax of the [HDF5 package](https://github.com/JuliaIO/HDF5.jl) is \n",
    "```\n",
    "fh = h5open(FileName,\"r\")   #open for reading\n",
    "    (x,y) = read(fh,\"x\",\"y\")\n",
    "close(fh)\n",
    "\n",
    "fh = h5open(FileName,\"w\")   #open for writing\n",
    "    write(fh,\"x\",x)\n",
    "    write(fh,\"y\",y)\n",
    "close(fh)\n",
    "```\n",
    "\n",
    "To save dates, save either a matrix `[y m d]` (see eg. `Dates.month(date)`) or a date value (see eg. `Dates.value(date)`). \n",
    "\n",
    "See https://support.hdfgroup.org/products/java/hdfview/\n",
    "for a program that lets you look at the contents of a hdf5 file. (It is not needed here.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Variables in h5 file: [\"B\", \"C\", \"x\", \"ymd\"]\n",
      "\n",
      "dates and x from h5 file is\n",
      "2019-05-14     1.100     1.200     1.300\n",
      "2019-05-15     2.100     2.200     2.300\n",
      "\n"
     ]
    }
   ],
   "source": [
    "using HDF5\n",
    "\n",
    "fh = h5open(\"Data/H5File.h5\",\"r\")                     #open for reading\n",
    "     println(\"\\nVariables in h5 file: \",names(fh))\n",
    "    (x,B,ymd) = read(fh,\"x\",\"B\",\"ymd\")                #load some of the data\n",
    "close(fh)\n",
    "\n",
    "dN = Date.(ymd[:,1],ymd[:,2],ymd[:,3])                #reconstructing dates\n",
    "\n",
    "println(\"\\ndates and x from h5 file is\")\n",
    "printmat(Any[dN x])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NewH5File.h5 has been created in the subfolder Results\n"
     ]
    }
   ],
   "source": [
    "x   = [1.1 1.2 1.3;\n",
    "       2.1 2.2 2.3]\n",
    "ymd = [2019 5 14;\n",
    "       2019 5 15]\n",
    "B   = 1\n",
    "C   = \"Nice cat\"\n",
    "\n",
    "fh = h5open(\"Results/NewH5File.h5\",\"w\")    #open file for writing   \n",
    "    write(fh,\"x\",x)\n",
    "    write(fh,\"ymd\",ymd)\n",
    "    write(fh,\"B\",B)\n",
    "    write(fh,\"C\",C)\n",
    "close(fh)                                  #close file\n",
    "\n",
    "println(\"NewH5File.h5 has been created in the subfolder Results\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading and Saving jld2\n",
    "\n",
    "jld2 files  can store very different types of data: integers, floats, strings, dictionaries, etc. It is a dialect of hdf5, designed to save different Julia objects (including Dates). \n",
    "\n",
    "The basic syntax of the [JLD2 package](https://github.com/simonster/JLD2.jl) is \n",
    "```\n",
    "(A,B) = load(FileName,\"A\",\"B\")        #load some data \n",
    "xx = load(\"Data/NewJldFile.jld2\")     #load all data into a Dict()\n",
    "save(FileName,\"A\",A,\"B\",B)            #save data\n",
    "```\n",
    "(It also possible to use the same syntax as for HDF5, except that we use ```jldopen``` instead of ```h5open```.)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The variables are: [\"B\", \"C\", \"x\", \"d\"]\n",
      "\n",
      "dates and x from jld2 file is\n",
      "2019-05-14     1.100     1.200     1.300\n",
      "2019-05-15     2.100     2.200     2.300\n",
      "\n"
     ]
    }
   ],
   "source": [
    "using FileIO, JLD2           #the FileIO package is also needed\n",
    "\n",
    "xx = load(\"Data/JldFile.jld2\")                    #load entire file\n",
    "println(\"The variables are: \",keys(xx))           #list contents of the file \n",
    "\n",
    "(x,d) = load(\"Data/JldFile.jld2\",\"x\",\"d\")          #read some of the data\n",
    "\n",
    "println(\"\\ndates and x from jld2 file is\")\n",
    "printmat(Any[dN x])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "NewJldFile.jld2 has been created in the subfolder Results\n"
     ]
    }
   ],
   "source": [
    "x   = [1.1 1.2 1.3;\n",
    "       2.1 2.2 2.3]\n",
    "d   = [Date(2019,5,14);                                #Julia dates\n",
    "       Date(2019,5,15)]\n",
    "B   = 1\n",
    "C   = \"Nice cat\"\n",
    "\n",
    "save(\"Results/NewJldFile.jld2\",\"x\",x,\"d\",d,\"B\",B,\"C\",C)       #write jld file\n",
    "\n",
    "println(\"NewJldFile.jld2 has been created in the subfolder Results\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Saving and Loading Matlab mat files (extra)\n",
    "\n",
    "The [MAT package](https://github.com/JuliaIO/MAT.jl) allows you to load/save (Matlab) mat files (which is a dialect of HDF5)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Variables in mat file: [\"B\", \"C\", \"dM\", \"x\"]\n",
      "\n",
      "dates and x from mat file is\n",
      "2019-05-14     1.100     1.200     1.300\n",
      "2019-05-15     2.100     2.200     2.300\n",
      "\n"
     ]
    }
   ],
   "source": [
    "using MAT\n",
    "\n",
    "function DateMLtoDate(dNum)         #Matlab datenum to Julia date\n",
    "  dNum   = round.(Int,dNum) .- 366\n",
    "  dNTime = Dates.rata2datetime.(dNum)\n",
    "  dN     = Date.(dNTime)\n",
    "  return dN\n",
    "end\n",
    "\n",
    "\n",
    "fh = matopen(\"Data/MatFile.mat\")\n",
    "    println(\"\\nVariables in mat file: \",names(fh))\n",
    "    (x,dM) = read(fh,\"x\",\"dM\")\n",
    "close(fh) \n",
    "\n",
    "d = DateMLtoDate(dM)                #Matlab datenum to Julia date\n",
    "\n",
    "println(\"\\ndates and x from mat file is\")\n",
    "printmat(Any[d x])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "NewMatFile.mat has been created in the subfolder Results\n"
     ]
    }
   ],
   "source": [
    "x   = [1.1 1.2 1.3;\n",
    "       2.1 2.2 2.3]\n",
    "d   = [Date(2019,5,14);                         #Julia dates\n",
    "       Date(2019,5,15)]\n",
    "B   = 1\n",
    "C   = \"Nice cat\"\n",
    "\n",
    "dM  = Dates.value.(d) .+ 366.0  #Julia Date to Matlab's datenum(), Float64\n",
    "\n",
    "fh = matopen(\"Results/NewMatFile.mat\",\"w\")\n",
    "    write(fh,\"x\",x)             #write one variable at a time\n",
    "    write(fh,\"B\",B)\n",
    "    write(fh,\"dM\",dM)\n",
    "    write(fh,\"C\",C)\n",
    "close(fh)\n",
    "\n",
    "println(\"\\nNewMatFile.mat has been created in the subfolder Results\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading an xls File\n",
    "\n",
    "The [XLSX package](https://github.com/felipenoris/XLSX.jl) allows you to read and write xls (and xlsx) files. \n",
    "\n",
    "As an alternative, you can use [ExcelReaders](https://github.com/davidanthoff/ExcelReaders.jl), which requires python and python's xlrd libarary. For instance, this would work\n",
    "```\n",
    "using ExcelReaders\n",
    "data1 = readxl(\"Data/XlsFile.xlsx\",\"Data!B2:C11\")\n",
    "x1    = convert.(Float64,data1)            \n",
    "printmat(x1)\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "part of the xlsx file:\n",
      "    16.660  -999.990\n",
      "    16.850  -999.990\n",
      "    16.930  -999.990\n",
      "    16.980  -999.990\n",
      "    17.080  -999.990\n",
      "    17.030     7.000\n",
      "    17.090     8.000\n",
      "    16.760  -999.990\n",
      "    16.670  -999.990\n",
      "    16.720  -999.990\n",
      "\n"
     ]
    }
   ],
   "source": [
    "using XLSX\n",
    "\n",
    "data1 = XLSX.readxlsx(\"Data/XlsFile.xlsx\")   #reading the entire file\n",
    "x1    = data1[\"Data!B2:C11\"]                 #extracting a part of the sheet \"Data\"\n",
    "x1    = convert.(Float64,x1)                 #converting from Any to Float64\n",
    "\n",
    "println(\"part of the xlsx file:\")\n",
    "printmat(x1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Creating Variables from Variable Names (extra)\n",
    "\n",
    "Suppose you have \n",
    "\n",
    "1. a matrix `x` with data\n",
    "2. a list of the variable names for each column of `x`, for instance, from `header` in a CSV file or saved as a vector of strings in a hdf5/mat/jld2 file.\n",
    "\n",
    "If there are few variables, then you can manually create each of them from the loaded matrix. This becomes tedious when there are many variables.\n",
    "\n",
    "However, it is easy to create a Dict() which can be used to easily refer to the variable names, for instance, `D[:X]` to get variable `X`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x[:,1], X and D[:X]\n",
      "     1.100     1.100     1.100\n",
      "     2.100     2.100     2.100\n",
      "\n"
     ]
    }
   ],
   "source": [
    "(x,header) = readdlm(\"Data/CsvFile.csv\",',',header=true)\n",
    "n = size(x,2)\n",
    "\n",
    "(X,Y,Z) = [x[:,i] for i=1:n]                         #manually creating X,Y,Z\n",
    "\n",
    "D = Dict([(Symbol(header[i]),x[:,i]) for i=1:n])     #Creating D with :X,:Y,:Z\n",
    "\n",
    "println(\"x[:,1], X and D[:X]\")\n",
    "printmat([x[:,1] X D[:X]])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "@webio": {
   "lastCommId": null,
   "lastKernelId": null
  },
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Julia 1.4.0",
   "language": "julia",
   "name": "julia-1.4"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.4.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
